SVM

Before moving forward with the to-do list, let’s throw an SVM at it.

For many reasons, Random Forest is usually a very good baseline model. In this particular case I started with the polynomial OLS as the baseline, simply because the correlations made it evident that the relationship between temperature and consumption follows a polynomial shape. Now it is the SVM’s turn.

/home/runner/work/strom/strom/.venv/lib/python3.10/site-packages/sklearn/svm/_base.py:1250: ConvergenceWarning:

Liblinear failed to converge, increase the number of iterations.
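
The warning above comes from the liblinear solver behind scikit-learn's linear SVM estimators. The usual remedies are to standardise the features and to raise `max_iter`. The following is only a minimal sketch on synthetic data: it assumes the model is a `LinearSVR`, and the data, the pipeline name `svm_pipe` and the chosen `max_iter` are illustrative, not taken from the `strom` project.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVR

# Synthetic stand-in data (assumption): two features on very different scales.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2)) * np.array([1.0, 1000.0])
y = 2.0 * X[:, 0] + 0.003 * X[:, 1] + rng.normal(scale=0.1, size=300)

# Standardising the inputs and raising max_iter (default: 1000) are the
# two common ways to help liblinear converge.
svm_pipe = make_pipeline(
    StandardScaler(),
    LinearSVR(max_iter=10_000, random_state=0),
)
svm_pipe.fit(X, y)

r2 = svm_pipe.score(X, y)  # R2 on the training data
print(f"R2 on the synthetic data: {r2:.3f}")
```

If the warning persists after scaling, a looser `tol` or a smaller `C` are further options, at the price of a less exact or more regularised fit.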

Model Cards provide a framework for transparent, responsible reporting. 
 Use the vetiver `.qmd` Quarto template as a place to start, 
 with vetiver.model_card()
Writing pin:
Name: 'wd-svm'
Version: 20251122T162915Z-1869e
♻️  stepit 'svm_raw': is up-to-date. Using cached result for `strom.modelling.assess_model()` 2025-11-22 16:29:15

Metrics

                                                   Single Split                      CV
Metric                                        train        test        test       train
MAE - Mean Absolute Error                  2.432282    2.419931    2.673258    3.329209
MSE - Mean Squared Error                  18.289605   22.503780   24.400576   23.520673
RMSE - Root Mean Squared Error             4.276635    4.743815    3.990919    4.776235
R2 - Coefficient of Determination          0.805274    0.749650    0.301124    0.760764
MAPE - Mean Absolute Percentage Error      0.193486    0.200061    0.203410    0.343579
EVS - Explained Variance Score             0.824572    0.789548    0.611400    0.820277
MeAE - Median Absolute Error               1.566992    1.308510    1.987998    2.657292
D2 - D2 Absolute Error Score               0.648420    0.643352    0.214133    0.531369
Pinball - Mean Pinball Loss                1.216141    1.209965    1.336629    1.664605
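
For readers who want to see what sits behind these numbers, here is a small standard-library sketch of several of the metrics. It is not the code of `strom.modelling.assess_model()`; the function name `regression_metrics` and the toy values are made up for illustration. It assumes non-zero targets (for MAPE) and a pinball quantile of 0.5, in which case the pinball loss is exactly half the MAE, as the table also shows.

```python
import math
from statistics import mean, median


def regression_metrics(y_true, y_pred):
    """Compute a few regression metrics from paired observations."""
    errors = [t - p for t, p in zip(y_true, y_pred)]
    abs_errors = [abs(e) for e in errors]
    y_bar = mean(y_true)

    mae = mean(abs_errors)
    mse = mean(e * e for e in errors)
    ss_res = sum(e * e for e in errors)              # residual sum of squares
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)   # total sum of squares

    return {
        "MAE": mae,
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "R2": 1.0 - ss_res / ss_tot,
        # Assumes no target is zero.
        "MAPE": mean(abs(e) / abs(t) for e, t in zip(errors, y_true)),
        "MeAE": median(abs_errors),
        # Pinball loss at quantile 0.5 equals MAE / 2.
        "Pinball": 0.5 * mae,
    }


# Toy values, purely illustrative.
metrics = regression_metrics([10.0, 12.0, 15.0, 20.0], [11.0, 11.0, 16.0, 18.0])
for name, value in metrics.items():
    print(f"{name:8s} {value:.6f}")
```

In the CV columns the RMSE is not the square root of the MSE, which would be consistent with both being averaged per fold, although the source does not say so.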

Scatter plot matrix

Observed vs. Predicted and Residuals vs. Predicted

Check for …

Check the residuals to assess the goodness of fit:

  • white noise or is there a pattern?
  • heteroscedasticity?
  • non-linearity?
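
The questions above are mostly answered by eye, but two of them can be backed by rough numbers. The sketch below is an assumption-laden illustration (the helper names `pearson` and `residual_checks` and the toy data are hypothetical): it correlates the residuals with the predictions to flag a linear trend, and the absolute residuals with the predictions to flag a spread that grows with the predicted value.

```python
import math


def pearson(xs, ys):
    """Plain Pearson correlation; assumes neither input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def residual_checks(y_true, y_pred):
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    return {
        # Should be close to zero for an unbiased fit.
        "mean_residual": sum(residuals) / len(residuals),
        # Linear trend of residuals against predictions.
        "trend": pearson(y_pred, residuals),
        # Growing spread: |residual| correlated with the prediction.
        "spread": pearson(y_pred, [abs(r) for r in residuals]),
    }


# Toy data: errors alternate in sign and grow with the prediction.
y_pred = [float(i) for i in range(1, 9)]
y_true = [p + (0.1 * p if i % 2 == 0 else -0.1 * p) for i, p in enumerate(y_pred)]
checks = residual_checks(y_true, y_pred)
print(checks)
```

A curved pattern in the residuals would not necessarily show up in these linear correlations, so the plots remain the main tool for spotting non-linearity.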

Normality of Residuals:

Check for …

  • Are residuals normally distributed?
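
Alongside the Q-Q plot, the question can be given a number with the Jarque-Bera statistic, which combines sample skewness and excess kurtosis. This is a standard-library sketch with made-up inputs, not part of the original analysis; under normality the statistic is approximately chi-squared with 2 degrees of freedom, so values above about 5.99 speak against normality at the 5% level.

```python
import random


def jarque_bera(residuals):
    """Return (skewness, excess kurtosis, Jarque-Bera statistic)."""
    n = len(residuals)
    m = sum(residuals) / n
    centred = [r - m for r in residuals]
    m2 = sum(c ** 2 for c in centred) / n  # second central moment
    m3 = sum(c ** 3 for c in centred) / n  # third central moment
    m4 = sum(c ** 4 for c in centred) / n  # fourth central moment
    skew = m3 / m2 ** 1.5
    excess_kurt = m4 / m2 ** 2 - 3.0
    jb = n / 6.0 * (skew ** 2 + excess_kurt ** 2 / 4.0)
    return skew, excess_kurt, jb


# A small symmetric sample: skewness is zero by construction.
skew_sym, kurt_sym, jb_sym = jarque_bera([-2.0, -1.0, 0.0, 1.0, 2.0])

# A clearly skewed sample (exponential draws, fixed seed).
random.seed(0)
skewed = [random.expovariate(1.0) for _ in range(500)]
skew_exp, kurt_exp, jb_exp = jarque_bera(skewed)

print(f"symmetric: skew={skew_sym:.3f}, JB={jb_sym:.3f}")
print(f"skewed:    skew={skew_exp:.3f}, JB={jb_exp:.1f}")
```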

Leverage

Scale-Location plot

Residuals Autocorrelation Plot
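
What this plot shows is the sample autocorrelation of the residuals at increasing lags. Here is a minimal standard-library sketch; the function name `autocorrelation` and the toy series are illustrative, and the residuals are assumed to be in time order. For white-noise residuals, roughly 95% of the values should fall within ±1.96/√n.

```python
import math


def autocorrelation(residuals, max_lag):
    """Sample autocorrelation for lags 1..max_lag."""
    n = len(residuals)
    m = sum(residuals) / n
    centred = [r - m for r in residuals]
    denom = sum(c * c for c in centred)
    return [
        sum(centred[t] * centred[t - k] for t in range(k, n)) / denom
        for k in range(1, max_lag + 1)
    ]


# Toy residuals that flip sign at every step: strong negative lag-1
# and strong positive lag-2 autocorrelation.
toy_residuals = [1.0, -1.0] * 50
acf = autocorrelation(toy_residuals, max_lag=4)
band = 1.96 / math.sqrt(len(toy_residuals))  # approximate 95% band

for lag, value in enumerate(acf, start=1):
    flag = "outside band" if abs(value) > band else "inside band"
    print(f"lag {lag}: {value:+.2f} ({flag})")
```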

Residuals vs Time

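Well, not that bad, but it is overfitting quite a lot: R2 drops from 0.81 (train) to 0.75 (test) on the single split, and the cross-validated test R2 is only 0.30.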

♻️  stepit 'grid_search_pipe': is up-to-date. Using cached result for `strom.modelling.grid_search_pipe()` 2025-11-22 16:29:19
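
The cached `grid_search_pipe` step is not shown on this page. As a rough idea of what such a tuning step might look like, here is a sketch with scikit-learn's `GridSearchCV`. Everything in it is an assumption: the `LinearSVR` estimator, the parameter grid, the `TimeSeriesSplit` folds (chosen because the data is time-ordered) and the synthetic data are illustrative and not taken from `strom.modelling`.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVR

# Synthetic stand-in data (assumption): three features, linear signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.2, size=200)

pipe = Pipeline(
    [
        ("scale", StandardScaler()),
        ("svr", LinearSVR(max_iter=10_000, random_state=0)),
    ]
)

# Hypothetical grid: regularisation strength and width of the epsilon tube.
param_grid = {
    "svr__C": [0.1, 1.0, 10.0],
    "svr__epsilon": [0.0, 0.1, 0.5],
}

search = GridSearchCV(
    pipe,
    param_grid,
    cv=TimeSeriesSplit(n_splits=5),     # folds respect the time order
    scoring="neg_mean_absolute_error",  # higher is better, hence the sign
)
search.fit(X, y)

best_params = search.best_params_
best_mae = -search.best_score_
print(best_params, f"CV MAE: {best_mae:.3f}")
```

A smaller `C` means stronger regularisation, which is the natural lever against the overfitting seen in the untuned model.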

Writing pin:
Name: 'wd-svm'
Version: 20251122T162919Z-50658
♻️  stepit 'svm_tuned': is up-to-date. Using cached result for `strom.modelling.assess_model()` 2025-11-22 16:29:19

Metrics

                                                   Single Split                      CV
Metric                                        train        test        test       train
MAE - Mean Absolute Error                  2.247125    2.106274    2.047927    2.378226
MSE - Mean Squared Error                  16.223860   19.443720   19.789512   16.932822
RMSE - Root Mean Squared Error             4.027885    4.409503    3.365674    4.111345
R2 - Coefficient of Determination          0.827268    0.783693    0.583568    0.826061
MAPE - Mean Absolute Percentage Error      0.184460    0.203015    0.152578    0.185927
EVS - Explained Variance Score             0.827974    0.791971    0.657677    0.827940
MeAE - Median Absolute Error               1.332851    1.088117    1.345858    1.470077
D2 - D2 Absolute Error Score               0.675184    0.689578    0.402628    0.663515
Pinball - Mean Pinball Loss                1.123562    1.053137    1.023964    1.189113

Scatter plot matrix

Observed vs. Predicted and Residuals vs. Predicted

Check for …

Check the residuals to assess the goodness of fit:

  • white noise or is there a pattern?
  • heteroscedasticity?
  • non-linearity?

Normality of Residuals:

Check for …

  • Are residuals normally distributed?

Leverage

Scale-Location plot

Residuals Autocorrelation Plot

Residuals vs Time

TODOs